




This folder contains the codebook and the original raw data for the iREDS study.  The subdirectories contain all of the code for the Bayesian analysis of the single items and scales.

The codebook can be found in the file "iREDS_codebook.PDF".

The original CSV data file sent by Erica on 02/07/2019 can be found in the file "ireds_dataset_originalfromErica02072019.csv".  To prepare the data for analysis, I modified Erica's file and created the main data file, "ireds_dataset_long.csv", by making the following changes:

1.  I created a LabID variable that assigns a number to each lab.  There is a one-to-one correspondence between LabID and PIID.

2.  Some cleanup: a) Erica's file had two lab members with the ID "lm113", so I changed the LMID for one of them (in LabID=22) to lm113a.  b) A variable named ataManagement_YesNo has been renamed to DataManagement_YesNo.  c) Two variables were named DMInvolved_YesNo; the second one was in the codebook position for DMPractice_YesNo, so I renamed the second one to DMPractice_YesNo.

3.  For participants who did not respond to one of the two waves (the pretest or the posttest), so that all values for that wave are missing, the analysis needs a separate record for the missing survey.  I therefore added a row for each instance of nonresponse, set all of the survey responses in that row to missing, and created a new variable, nonresponse, that is 1 if the whole survey wave is missing and 0 otherwise.

4.  I created the ireds_dataset_wide.csv with the cleaned data.  

ireds_dataset_long.csv uses the "long" format, where each respondent has two rows, one for the pretest and one for the posttest, and the variable Pretest is 1 for the pretest wave and 0 for the posttest wave.  ireds_dataset_wide.csv uses the "wide" format, where each respondent has one row and the pretest and posttest waves are recorded in separate columns.  Variables from the pretest wave have the t0_* prefix and those from the posttest have the t1_* prefix.  All of these files keep a placeholder for a survey that a respondent did not answer: the corresponding row (long format) or columns (wide format) are still present, with all responses set to missing.
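The padding of nonresponse rows and the long-to-wide layout described above can be sketched in a few lines of Python.  This is only an illustration, not the procedure actually used to build the files: the respondent ID column RespID and the survey item Q1 are hypothetical placeholders, while Pretest, nonresponse, and the t0_/t1_ prefixes follow the description above.

```python
# Illustrative sketch only. RespID and Q1 are hypothetical placeholder names;
# Pretest, nonresponse, and the t0_/t1_ prefixes follow the README text.

def pad_nonresponse(rows, id_col="RespID", item_cols=("Q1",)):
    """Return long-format rows with exactly one row per respondent per wave."""
    seen = {(r[id_col], r["Pretest"]) for r in rows}
    # Rows that exist in the input are real responses.
    padded = [dict(r, nonresponse=0) for r in rows]
    for rid in sorted({r[id_col] for r in rows}):
        for wave in (1, 0):  # 1 = pretest, 0 = posttest
            if (rid, wave) not in seen:
                # Add a placeholder row with every survey item set to missing.
                missing = {id_col: rid, "Pretest": wave, "nonresponse": 1}
                missing.update({c: None for c in item_cols})
                padded.append(missing)
    return padded


def long_to_wide(rows, id_col="RespID", item_cols=("Q1",)):
    """Collapse the two long rows per respondent into one wide row."""
    wide = {}
    for r in rows:
        prefix = "t0_" if r["Pretest"] == 1 else "t1_"
        out = wide.setdefault(r[id_col], {id_col: r[id_col]})
        for c in item_cols:
            out[prefix + c] = r[c]
    return [wide[k] for k in sorted(wide)]


long_rows = [
    {"RespID": "a", "Pretest": 1, "Q1": 3},
    {"RespID": "a", "Pretest": 0, "Q1": 4},
    {"RespID": "b", "Pretest": 1, "Q1": 2},  # no posttest response
]
padded = pad_nonresponse(long_rows)
wide_rows = long_to_wide(padded)
```

In this toy example respondent "b" gains a posttest row with nonresponse set to 1, and the wide row for "b" has t1_Q1 set to missing.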

5.  Note that all columns containing text data must be deleted before importing into Stata, because some unknown formatting in some of the text columns prevents the import from CSV to Stata format.  The columns with text in the ireds_dataset_long.csv file are E, H, J, M, V, AC, AI, AW, BI, BU, DF, DH, DN, DV, DY, EE, GE, and GI.

6.  The file ireds_dataset_wide_v12.dta, in Stata version 12 format, is identical to the corresponding CSV file except that the ireds_wide.do file has been used to create additional variables.

7.  Some observations about the structure of the raw data.  There are 63 respondents who failed to respond to one of the two waves: 9 did not respond to the pretest and 54 did not respond to the posttest.  Of these 63 single-wave respondents, 7 are PIs and 56 are lab members.  The response rate on the pretest is not dependent on assignment (94.4% response controls, 95.8% treatment, Chi2=0.66), but there is weak evidence for dependence on the posttest (76.4% controls, 65.3% treatment, Chi2=0.10), although the substantive difference is not large.  There are 121 participants who responded to both waves and a total of 34 labs.  There are two labs where none of the members (PI or LM) responded on the posttest, LabID=4 and LabID=10.  The analysis retains these labs since they contain pretest data, but the posttest responses will be imputed for every member of these labs.
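For item 5 above, the text columns are listed as spreadsheet-style column letters.  A minimal sketch of dropping them with Python's standard csv module is given below, assuming the letters refer to column positions in ireds_dataset_long.csv as it would appear in a spreadsheet.  The helper names and the output file name in the commented usage are hypothetical, and this is not necessarily how the Stata import was actually prepared.

```python
import csv
import io

# Column letters copied from item 5.
TEXT_COLUMNS = ["E", "H", "J", "M", "V", "AC", "AI", "AW", "BI", "BU",
                "DF", "DH", "DN", "DV", "DY", "EE", "GE", "GI"]


def column_index(letters):
    """Convert a spreadsheet column label (A, ..., Z, AA, ...) to a 0-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def drop_columns(reader, writer, letters):
    """Copy rows from reader to writer, skipping the lettered columns."""
    drop = {column_index(label) for label in letters}
    for row in reader:
        writer.writerow([v for i, v in enumerate(row) if i not in drop])


# Small in-memory demonstration: drop column C from a three-column file.
demo_in = io.StringIO("id,score,comment\n1,5,fine\n")
demo_out = io.StringIO()
drop_columns(csv.reader(demo_in), csv.writer(demo_out, lineterminator="\n"), ["C"])

# Hypothetical usage on the real file (output file name is an assumption):
# with open("ireds_dataset_long.csv", newline="") as src, \
#         open("ireds_dataset_long_notext.csv", "w", newline="") as dst:
#     drop_columns(csv.reader(src), csv.writer(dst), TEXT_COLUMNS)
```

Dropping by position rather than by name avoids having to know the variable names of the text columns, but it only works while the column order matches the letters listed in item 5.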

